Do Agents Need to Plan Step-by-Step? Rethinking Planning Horizon in Data-Centric Tool Calling
Naoki Otani (Megagon Labs), Nikita Bhutani (Megagon Labs), Hannah Kim (Megagon Labs), Dan Zhang (Megagon Labs), Estevam Hruschka (Megagon Labs)
Architectural Patterns & Composition
An empirical study challenging the assumption that single-step 'think-then-act' planning is the right default for agentic AI: for data-centric tool-calling tasks, full-horizon planning (generating a complete plan before any execution) with lazy replanning matches the accuracy of step-by-step planning while using 2-3x fewer tokens. The finding questions a design choice that is a common default in current agent frameworks.
Presentation
Talk
Paper Session 4: Agent Memory & Planning
Thursday, May 28 · 9:50 AM – 10:00 AM
Bayshore Ballroom
Poster
Thursday, May 28 · 4:30 PM – 6:00 PM
Carmel
Abstract
Explicit planning is a critical capability for LLM-based agents solving complex data-centric tasks, which require precise tool calling over external data sources. Existing strategies fall into two paradigms based on planning horizon: (1) full-horizon (FH), which generates a complete plan before execution, and (2) single-step horizon (SH), which interleaves each action (tool call) with incremental reasoning and observation. While step-by-step execution is a common default under the assumption that eager execution monitoring is necessary for adaptability, we revisit this assumption for well-defined data-centric tasks. Our controlled empirical study isolates planning horizon as the key architectural feature and systematically analyzes the effects of topological complexity and tool robustness on both paradigms. Our experiments across Knowledge Base Question Answering and Multi-hop QA show that FH planning with lazy replanning achieves accuracy parity with SH across varying depths, breadths, and robustness levels, while using 2-3x fewer tokens. These findings suggest that for well-defined data-centric tasks, eager step-wise monitoring is often unnecessary, and full-horizon planning with on-demand replanning can offer a more efficient default.
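The two planning horizons differ mainly in their control flow. The sketch below shows that difference. It rests on several assumptions that are not the authors' setup: a stub function stands in for the LLM planner, the tool is a toy three-hop knowledge-base lookup, and all names (`KB`, `GOLD_PATH`, `FlakyLookup`, `plan`, `run_full_horizon`, `run_single_step`) are hypothetical. It counts planner calls as a rough proxy for LLM cost.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical toy knowledge base: entity -> {relation: value}.
KB = {
    "Ada": {"born_in": "London"},
    "London": {"country": "UK"},
    "UK": {"currency": "GBP"},
}

# Hypothetical gold relation path for the question
# "What currency is used in the country where Ada was born?"
GOLD_PATH = ["born_in", "country", "currency"]


@dataclass
class FlakyLookup:
    """Toy KB tool; fails once for each relation listed in fail_once."""
    fail_once: set = field(default_factory=set)

    def __call__(self, entity: str, relation: str) -> Optional[str]:
        if relation in self.fail_once:
            self.fail_once.discard(relation)
            return None  # simulated transient tool error
        return KB.get(entity, {}).get(relation)


def plan(done: list, full: bool) -> list:
    """Stub standing in for an LLM planner (an assumption, not a real model).

    full=True returns every remaining step; full=False returns only the next one.
    """
    remaining = GOLD_PATH[len(done):]
    return remaining if full else remaining[:1]


def run_full_horizon(start: str, tool, max_replans: int = 3):
    """FH: plan everything up front and replan lazily, only when a step fails."""
    entity, done, calls = start, [], 1
    steps = plan(done, full=True)
    while steps:
        relation = steps.pop(0)
        result = tool(entity, relation)
        if result is None:
            if max_replans == 0:
                return None, calls
            max_replans -= 1
            steps = plan(done, full=True)  # on-demand replan from current state
            calls += 1
            continue
        entity = result
        done.append(relation)
    return entity, calls


def run_single_step(start: str, tool, max_steps: int = 10):
    """SH: consult the planner after every observation, including to stop."""
    entity, done, calls = start, [], 0
    for _ in range(max_steps):
        steps = plan(done, full=False)
        calls += 1
        if not steps:  # planner decides the task is finished
            return entity, calls
        result = tool(entity, steps[0])
        if result is not None:
            entity = result
            done.append(steps[0])
    return None, calls


print(run_full_horizon("Ada", FlakyLookup()))             # ('GBP', 1)
print(run_single_step("Ada", FlakyLookup()))              # ('GBP', 4)
print(run_full_horizon("Ada", FlakyLookup({"country"})))  # ('GBP', 2)
print(run_single_step("Ada", FlakyLookup({"country"})))   # ('GBP', 5)
```

On a clean run, the full-horizon loop calls the planner once. The single-step loop calls it four times: once per hop, plus a final stop decision. With one transient tool failure, the counts become two and five. These toy counts only illustrate where the savings of lazy replanning come from. They do not reproduce the paper's token measurements, and they ignore the growing context that a real single-step agent re-reads at each call.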